Data-Dependent Margin-Based Generalization Bounds for Classification
Abstract
We derive new margin-based inequalities for the probability of error of classifiers. The main feature of these bounds is that they can be calculated from the training data and can therefore be used effectively for model selection. In particular, the bounds involve empirical complexities measured on the training data (such as the empirical fat-shattering dimension) rather than the worst-case counterparts traditionally used in such analyses. Our bounds also appear to be sharper and more general than recent results involving empirical complexity measures. In addition, we develop an alternative data-based bound for the generalization error of classes of convex combinations of classifiers, involving an empirical complexity measure that is easier to compute than the empirical covering number or the fat-shattering dimension. We also show examples of efficient computation of the new bounds.
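The abstract contrasts complexity terms measured on the training data with worst-case ones. As a minimal sketch of what "computable from the training data" can mean, the Python code below evaluates a generic Rademacher-type margin bound for a convex combination of ±1-valued base classifiers. This is not the bound derived in the paper. It swaps in the empirical Rademacher complexity (not the empirical fat-shattering dimension or the paper's alternative complexity measure) and uses one common textbook form, error ≤ (fraction of margins ≤ γ) + (2/γ)·R̂ + 3·sqrt(ln(2/δ)/(2n)), for a margin γ fixed before seeing the data. The function names, the Monte Carlo estimator, and the toy predictions, labels, and weights are all hypothetical illustrations.

```python
import math
import random


def empirical_margin_error(margins, gamma):
    """Fraction of training points whose margin y_i * f(x_i) is at most gamma."""
    return sum(1 for m in margins if m <= gamma) / len(margins)


def empirical_rademacher_finite(predictions, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity
    E_sigma[ max_h (1/n) * sum_i sigma_i * h(x_i) ] of a finite class of
    {-1, +1}-valued classifiers, each given by its prediction vector on the
    n training points. The convex hull of the class has the same value."""
    rng = random.Random(seed)
    n = len(predictions[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw independent uniform random signs (Rademacher variables).
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the class of the correlation with the random signs.
        total += max(sum(s * p for s, p in zip(sigma, preds))
                     for preds in predictions) / n
    return total / n_draws


def margin_bound(margins, predictions, gamma, delta, n_draws=2000, seed=0):
    """Generic Rademacher-type margin bound for a gamma fixed in advance:
    error <= margin_error + (2 / gamma) * R_hat + 3 * sqrt(ln(2 / delta) / (2n)).
    The Monte Carlo error in R_hat is ignored here; a rigorous version would
    need to account for it (and a union bound to choose gamma from the data)."""
    n = len(margins)
    r_hat = empirical_rademacher_finite(predictions, n_draws, seed)
    slack = 3 * math.sqrt(math.log(2 / delta) / (2 * n))
    return empirical_margin_error(margins, gamma) + 2 * r_hat / gamma + slack


# Toy usage with hypothetical data: 3 base classifiers evaluated on 4 points.
preds = [[1, 1, -1, -1], [1, -1, 1, -1], [1, 1, 1, 1]]
labels = [1, 1, -1, -1]
weights = [0.6, 0.2, 0.2]  # convex combination: nonnegative, sums to 1
margins = [y * sum(w * p[i] for w, p in zip(weights, preds))
           for i, y in enumerate(labels)]
print(empirical_margin_error(margins, 0.5))  # → 0.25
print(margin_bound(margins, preds, gamma=0.5, delta=0.05))
```

With only four points the resulting bound exceeds 1 and is vacuous. The point of the sketch is only that every term is computed from the sample. Because the empirical Rademacher complexity of a convex hull equals that of its base class, the supremum ranges over the finite list of base prediction vectors rather than over all convex combinations.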
Similar resources
A QUADRATIC MARGIN-BASED MODEL FOR WEIGHTING FUZZY CLASSIFICATION RULES INSPIRED BY SUPPORT VECTOR MACHINES
Recently, tuning the weights of the rules in Fuzzy Rule-Base Classification Systems is researched in order to improve the accuracy of classification. In this paper, a margin-based optimization model, inspired by Support Vector Machine classifiers, is proposed to compute these fuzzy rule weights. This approach not only considers both accuracy and generalization criteria in a single objective fu...
Rademacher Complexity Bounds for Non-I.I.D. Processes
This paper presents the first Rademacher complexity-based error bounds for non-i.i.d. settings, a generalization of similar existing bounds derived for the i.i.d. case. Our bounds hold in the scenario of dependent samples generated by a stationary β-mixing process, which is commonly adopted in many previous studies of non-i.i.d. settings. They benefit from the crucial advantages of Rademacher com...
Maximum Relative Margin and Data-Dependent Regularization
Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum mar...
Margin-based Generalization Error Bounds for Threshold Decision Lists
This paper concerns the use of threshold decision lists for classifying data into two classes. The use of such methods has a natural geometrical interpretation and can be appropriate for an iterative approach to data classification, in which some points of the data set are given a particular classification, according to a linear threshold function (or hyperplane), are then removed from consider...
Calibrated Surrogate Losses for Classification with Label-Dependent Costs
We present surrogate regret bounds for arbitrary surrogate losses in the context of binary classification with label-dependent costs. Such bounds relate a classifier’s risk, assessed with respect to a surrogate loss, to its cost-sensitive classification risk. Two approaches to surrogate regret bounds are developed. The first is a direct generalization of Bartlett et al. [2006], who focus on mar...